Fixed effects model

In econometrics and statistics, a fixed effects model is a statistical model that represents the observed quantities in terms of explanatory variables that are treated as if the quantities were non-random. This is in contrast to random effects models and mixed models, in which either all or some of the explanatory variables are treated as if they arise from random causes. Often the same structure of model, usually a linear regression model, can be treated as any of the three types depending on the analyst's viewpoint, although there may be a natural choice in any given situation.

In panel data analysis, the term fixed effects estimator (also known as the within estimator) is used to refer to an estimator for the coefficients in the regression model. If we assume fixed effects, we impose time independent effects for each entity that are possibly correlated with the regressors.

Qualitative description

Such models assist in controlling for unobserved heterogeneity when this heterogeneity is constant over time and correlated with independent variables. This constant can be removed from the data through differencing, for example by taking a first difference which will remove any time invariant components of the model.
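As a minimal sketch with made-up numbers, the following shows why demeaning works: a time-invariant individual effect is cancelled exactly, leaving only the slope of interest.

```python
# Sketch with hypothetical data: y_t = alpha + beta * x_t, where alpha is a
# time-invariant individual effect and beta = 2 is the slope of interest.
alpha, beta = 5.0, 2.0
x = [1.0, 2.0, 3.0, 4.0]
y = [alpha + beta * xi for xi in x]          # noise omitted for clarity

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Demeaning (the "within" transformation) cancels alpha from every observation,
# so a simple least-squares slope on the demeaned data recovers beta.
num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
den = sum((xi - x_bar) ** 2 for xi in x)
beta_hat = num / den
print(beta_hat)  # 2.0 -- alpha has been removed entirely
```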

There are two common assumptions made about the individual specific effect, the random effects assumption and the fixed effects assumption. The random effects assumption (made in a random effects model) is that the individual specific effects are uncorrelated with the independent variables. The fixed effects assumption is that the individual specific effect is correlated with the independent variables. If the random effects assumption holds, the random effects model is more efficient than the fixed effects model. However, if this assumption does not hold (i.e., if the Durbin–Wu–Hausman test fails), the random effects model is not consistent.

Quantitative description

Formally the model is

y_{it}=\beta_{0}+X_{it}\beta+Z_{i}\gamma+\alpha_{i}+u_{it},

where y_{it} is the dependent variable observed for individual i at time t, X_{it} is the time-variant regressor, Z_{i} is the time-invariant regressor, \alpha_{i} is the unobserved individual effect, and u_{it} is the error term. \alpha_{i} could represent motivation, ability, genetics (micro data) or historical factors and institutional factors (country-level data).

The two main methods of dealing with  \alpha_{i} are to make the random effects or fixed effects assumption:

1. Random effects (RE): Assume  \alpha_{i} is independent of  X_{it},Z_{i} or  E(\alpha_{i}|X_{it},Z_{i})=0 . (In some biostatistical applications,  X_{it},Z_{i} are predetermined, \beta and \gamma would be called the population effects or "fixed effects", and the individual effect or "random effect"  \alpha_{i} is often denoted  b_{i} .)

2. Fixed effects (FE): Assume  \alpha_{i} is not independent of  X_{it},Z_{i}. (There is no equivalent conceptualization in biostatistics; a predetermined  X_{it},Z_{i} cannot vary, and so cannot be probabilistically associated with the random  \alpha_{i} .)

To remove the individual effect \alpha_{i}, a differencing or within transformation (time-demeaning) is applied to the data, and \beta is then estimated via ordinary least squares (OLS). The most common differencing methods are:

1. Fixed effects (FE) model: y_{it}-\overline{y_{i}}=\left(X_{it}-\overline{X_{i}}\right)\beta+\left(u_{it}-\overline{u_{i}}\right) where \overline{X_{i}}=\frac{1}{T}\sum\limits_{t=1}^{T}X_{it} and \overline{u_{i}}=\frac{1}{T}\sum\limits_{t=1}^{T}u_{it}.

\qquad\hat{\beta}_{FE}=\left(\sum\limits_{i,t}\widehat{x}_{it}^{\prime}\widehat{x}_{it}\right)^{-1}\sum\limits_{i,t}\widehat{x}_{it}^{\prime}\widehat{y}_{it}

where \widehat{x}_{it}=X_{it}-\overline{X_{i}} and \widehat{y}_{it}=y_{it}-\overline{y_{i}}.
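The within estimator above can be sketched for a single scalar regressor as follows; the panel data are hypothetical, with a common slope of 3 and very different individual intercepts.

```python
# A minimal sketch of the within (FE) estimator for a scalar regressor.
# panel[i] holds the time-ordered (x_it, y_it) pairs for individual i.
def fe_within(panel):
    num = den = 0.0
    for obs in panel:
        T = len(obs)
        x_bar = sum(x for x, _ in obs) / T
        y_bar = sum(y for _, y in obs) / T
        for x, y in obs:
            num += (x - x_bar) * (y - y_bar)   # accumulates x_hat' y_hat
            den += (x - x_bar) ** 2            # accumulates x_hat' x_hat
    return num / den

panel = [[(1, 13), (2, 16), (3, 19)],   # individual 1: y = 3x + 10
         [(1, -2), (2, 1), (3, 4)]]     # individual 2: y = 3x - 5
print(fe_within(panel))  # 3.0 -- the intercepts alpha_i drop out
```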

2. First difference (FD) model: y_{it}-y_{it-1}=\left(X_{it}-X_{it-1}\right)\beta+\left(u_{it}-u_{it-1}\right)

\qquad\hat{\beta}_{FD}=\left(\sum\limits_{i,t}\widehat{x}_{it}^{\prime}\widehat{x}_{it}\right)^{-1}\sum\limits_{i,t}\widehat{x}_{it}^{\prime}\widehat{y}_{it}

where \widehat{x}_{it}=X_{it}-X_{it-1} and \widehat{y}_{it}=y_{it}-y_{it-1}.
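A corresponding sketch of the first-difference estimator, again with hypothetical data (slope 3, distinct individual intercepts); differencing consecutive periods removes \alpha_{i}.

```python
# A minimal sketch of the first-difference (FD) estimator, scalar regressor.
def fd_estimator(panel):
    num = den = 0.0
    for obs in panel:                      # obs: time-ordered (x, y) pairs
        for (x0, y0), (x1, y1) in zip(obs, obs[1:]):
            num += (x1 - x0) * (y1 - y0)   # x_hat' y_hat, x_hat = X_it - X_it-1
            den += (x1 - x0) ** 2          # x_hat' x_hat
    return num / den

panel = [[(1, 13), (2, 16), (3, 19)],   # individual 1: y = 3x + 10
         [(1, -2), (2, 1), (3, 4)]]     # individual 2: y = 3x - 5
print(fd_estimator(panel))  # 3.0, matching the true slope
```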

3. Long difference (LD) model: y_{it}-y_{i1}=\left(X_{it}-X_{i1}\right)\beta+\left(u_{it}-u_{i1}\right)

\qquad\hat{\beta}_{LD}=\left(\sum\limits_{i,t}\widehat{x}_{it}^{\prime}\widehat{x}_{it}\right)^{-1}\sum\limits_{i,t}\widehat{x}_{it}^{\prime}\widehat{y}_{it}

where \widehat{x}_{it}=X_{it}-X_{i1} and \widehat{y}_{it}=y_{it}-y_{i1}.

Another common approach to removing the individual effect is to add a dummy variable for each individual i (the least squares dummy variable approach). This is numerically, but not computationally, equivalent to the fixed effects model: it yields the same estimate of \beta, but is practical only when the number of individuals is small, since one additional parameter must be estimated per individual.

A common misconception about fixed effects models is that \gamma, the coefficient on the time-invariant regressor, cannot be estimated. In fact, one can estimate \gamma using instrumental variables techniques.

Let \widehat{d_{i}}=\overline{y_{i}}-\overline{X_{i}}\beta=Z_{i}\gamma+\varphi_{i}.

We cannot use OLS to estimate \gamma from this equation because Z_{i} is correlated with \alpha_{i} (i.e., there is an endogeneity problem from our FE assumption). If instruments are available, one can use IV estimation to estimate \gamma, or use the Hausman–Taylor method.

Equality of Fixed Effects (FE) and First Differences (FD) estimators when T=2

The fixed effects estimator is:

{FE}_{T=2}=\left[\sum_{i=1}^{N}(x_{i1}-\bar{x}_{i})(x_{i1}-\bar{x}_{i})'+(x_{i2}-\bar{x}_{i})(x_{i2}-\bar{x}_{i})'\right]^{-1}\left[\sum_{i=1}^{N}(x_{i1}-\bar{x}_{i})(y_{i1}-\bar{y}_{i})+(x_{i2}-\bar{x}_{i})(y_{i2}-\bar{y}_{i})\right]

Since each (x_{i1}-\bar{x}_{i}) can be rewritten as \left(x_{i1}-\dfrac{x_{i1}+x_{i2}}{2}\right)=\dfrac{x_{i1}-x_{i2}}{2}, we can rewrite the estimator as:

{FE}_{T=2}=\left[\sum_{i=1}^{N}\dfrac{x_{i1}-x_{i2}}{2}\dfrac{x_{i1}-x_{i2}}{2}'+\dfrac{x_{i2}-x_{i1}}{2}\dfrac{x_{i2}-x_{i1}}{2}'\right]^{-1}\left[\sum_{i=1}^{N}\dfrac{x_{i1}-x_{i2}}{2}\dfrac{y_{i1}-y_{i2}}{2}+\dfrac{x_{i2}-x_{i1}}{2}\dfrac{y_{i2}-y_{i1}}{2}\right]

= \left[\sum_{i=1}^{N} 2  \dfrac{x_{i2}-x_{i1}}{2} \dfrac{x_{i2}-x_{i1}}{2} ' \right]^{-1} \left[\sum_{i=1}^{N}   2 \dfrac{x_{i2}-x_{i1}}{2} \dfrac{y_{i2}-y_{i1}}{2} \right]
= 2\left[\sum_{i=1}^{N} (x_{i2}-x_{i1})(x_{i2}-x_{i1})' \right]^{-1} \left[\sum_{i=1}^{N} \frac{1}{2} (x_{i2}-x_{i1})(y_{i2}-y_{i1}) \right]
 = \left[\sum_{i=1}^{N} (x_{i2}-x_{i1})(x_{i2}-x_{i1})' \right]^{-1} \sum_{i=1}^{N} (x_{i2}-x_{i1})(y_{i2}-y_{i1}) ={FD}_{T=2}

Thus the equality is established.
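The algebra above can also be checked numerically. The sketch below implements both estimators for a scalar regressor and confirms they coincide on a hypothetical two-period panel.

```python
# Numeric check (hypothetical data) that FE and FD coincide when T = 2.
def fe(panel):
    num = den = 0.0
    for obs in panel:
        xb = sum(x for x, _ in obs) / len(obs)
        yb = sum(y for _, y in obs) / len(obs)
        num += sum((x - xb) * (y - yb) for x, y in obs)
        den += sum((x - xb) ** 2 for x, _ in obs)
    return num / den

def fd(panel):
    num = den = 0.0
    for obs in panel:
        for (x0, y0), (x1, y1) in zip(obs, obs[1:]):
            num += (x1 - x0) * (y1 - y0)
            den += (x1 - x0) ** 2
    return num / den

panel = [[(1.0, 2.0), (3.0, 7.0)],      # two individuals, T = 2 periods each
         [(0.0, 1.0), (2.0, 4.0)]]
print(fe(panel), fd(panel))  # identical values, as the derivation predicts
```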

Hausman–Taylor method

The method requires more than one time-variant regressor (X) and time-invariant regressor (Z), with at least one X and one Z that are uncorrelated with \alpha_{i}.

Partition the X and Z variables such that

X=[\underset{TN\times K_{1}}{X_{1it}}\;\vdots\;\underset{TN\times K_{2}}{X_{2it}}],\qquad Z=[\underset{TN\times G_{1}}{Z_{1it}}\;\vdots\;\underset{TN\times G_{2}}{Z_{2it}}]

where X_{1} and Z_{1} are uncorrelated with \alpha_{i}. We need K_{1}>G_{2}.

Estimating \gamma via OLS on \widehat{d_{i}}=Z_{i}\gamma+\varphi_{i}, using X_{1} and Z_{1} as instruments, yields a consistent estimate.

Testing FE vs. RE

We can test whether a fixed or random effects model is appropriate using a Hausman test.

H_{0}: \alpha_{i}\perp X_{it},Z_{i}
H_{a}: \alpha_{i}\not \perp X_{it},Z_{i}

If H_{0} is true, both \widehat{\beta}_{RE} and \widehat{\beta}_{FE} are consistent, but only \widehat{\beta}_{RE} is efficient. If H_{a} is true, \widehat{\beta}_{FE} is consistent and \widehat{\beta}_{RE} is not.

\widehat{Q}=\widehat{\beta}_{RE}-\widehat{\beta}_{FE}

\widehat{HT}=T\widehat{Q}^{\prime}[\operatorname{Var}(\widehat{\beta}_{FE})-\operatorname{Var}(\widehat{\beta}_{RE})]^{-1}\widehat{Q}\sim\chi_{K}^{2}, where K=\dim(\widehat{Q})
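For a scalar coefficient the statistic reduces to a simple ratio. The sketch below uses entirely hypothetical estimates and variances, not results from any real data set.

```python
# Hypothetical scalar estimates and variances (all numbers made up).
T = 10                       # number of time periods
b_fe, var_fe = 1.10, 0.020   # FE: consistent under H0 and Ha
b_re, var_re = 1.02, 0.012   # RE: efficient, but consistent only under H0

q = b_re - b_fe
ht = T * q * q / (var_fe - var_re)   # scalar case of T Q'[Var_FE - Var_RE]^{-1} Q
print(round(ht, 2))  # 8.0 -- exceeds the 5% chi-squared(1) critical value 3.84,
                     # so with these made-up numbers we would reject H0 (reject RE)
```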

The Hausman test is a specification test, so a large test statistic may indicate errors in variables (EIV) or a misspecified model. If the FE assumption is true, we should find that \widehat{\beta}_{LD}\approx\widehat{\beta}_{FD}\approx\widehat{\beta}_{FE}.

A simple heuristic is that if \left\vert\widehat{\beta}_{LD}\right\vert>\left\vert\widehat{\beta}_{FE}\right\vert>\left\vert\widehat{\beta}_{FD}\right\vert, there could be EIV.

Steps in Fixed Effects Model for sample data

  1. Calculate group means and the grand mean
  2. Calculate k = number of groups, n = number of observations per group, N = total number of observations (k x n)
  3. Calculate SS-total (total variance): sum of (each score - grand mean)^2
  4. Calculate SS-treat (treatment effect): n x sum of (each group mean - grand mean)^2
  5. Calculate SS-error (error effect): sum of (each score - its group mean)^2
  6. Calculate the degrees of freedom: df-total = N-1, df-treat = k-1, df-error = k(n-1)
  7. Calculate the mean squares: MS-treat = SS-treat/df-treat and MS-error = SS-error/df-error
  8. Calculate the obtained F value: MS-treat/MS-error
  9. Use an F-table or probability function to look up the critical F value at a chosen significance level
  10. Conclude whether the treatment effect significantly affects the variable of interest
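The steps above can be sketched as follows, using hypothetical balanced data with k = 2 groups of n = 3 observations each.

```python
# Hypothetical balanced data: k = 2 groups, n = 3 observations per group.
groups = [[4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]
k = len(groups)
n = len(groups[0])
N = k * n                                               # step 2
grand = sum(sum(g) for g in groups) / N                 # step 1: grand mean
means = [sum(g) / n for g in groups]                    # step 1: group means
ss_treat = n * sum((m - grand) ** 2 for m in means)     # step 4
ss_error = sum((x - m) ** 2                             # step 5
               for g, m in zip(groups, means) for x in g)
ms_treat = ss_treat / (k - 1)                           # steps 6-7
ms_error = ss_error / (k * (n - 1))
f_obtained = ms_treat / ms_error                        # step 8
print(f_obtained)  # 13.5, well above the 5% critical F(1, 4) of about 7.71,
                   # so the (hypothetical) treatment effect is significant
```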
